home *** CD-ROM | disk | FTP | other *** search
-
- Forthmacs Implementation
- ************************
-
- This chapter describes how RISC OS Forthmacs implements the Forth
- virtual machine on the ARM processors. It assumes that you have a
- fairly good knowledge of conventional Forth implementations; it does
- not attempt to be a tutorial on how Forth works.
-
-
- Dialect
- =======
-
- RISC OS Forthmacs is an implementation of the Forth-83 standard, with
- a few exceptions. It is a descendent of the public-domain F83
- implementation by Laxen and Perry and contains most of the F83
- extensions as well as many new ones. It is compatible with the other
- implementations for Sun-68k, Sparc, Atari, Macintosh and OS-9
- computers.
-
-
- Stack Width and Addressing
- ==========================
-
- RISC OS Forthmacs deviates from the 83 Standard in the width of the
- stack. Forth 83 specifies that stack items are 16-bit numbers, and
- that the address space is 64K. This wastes much of the power of
- modern CPUs and is almost impossible to implement on ARM based
- computers.
-
- In RISC OS Forthmacs, all stack items as well as memory cells are
- 32-bit wide, remember this when writing portable programs.
-
- The address could conceivably grow to 2 to the 32nd power (4
- gigabytes), but this is restricted by the current CPU/MMU versions to
- 16 MBytes. 16-bit or 2-byte memory accesses are not supported any
- longer and must be emulated if necessary.
-
- Note: word accesses are simulated by two byte accesses, take care
- about interrupts occurring here!
-
- The current ARM MMUs don't support non-aligned memory accesses. NOTE:
- They don't abort or run any exception vector but just do something not
- clearly defined and CPU dependent. Take care of this, it took me
- hours to find a bug!
-
- All accesses must be one of:
-
- 1) byte-wide access to any address in the address area
-
- 2) cell-wide ( 32-bit ) access to any aligned address
-
-
- Both stacks are pre-decrementing/post-incrementing. The parameter
- stack holds its top-of-stack in the top-register r10, this allows much
- faster code definitions because of the CPUs load-and-store
- architecture.
-
-
- Register Usage
- ==============
-
-
- r7 floating stack pointer fsp
- r8 instruction pointer ip
- r9 user area pointer up
- r10 top-of-stack register top
- r11 returnstack pointer rp
- r12 RISC OS frame pointer fp never use this!
- r13 stack pointer sp
- r14 link register lk
- r15 pc + status + flags pc
- r15 sr sr hold the flags part of r15
-
- Note: In future CPU Versions the internal structure of the pc-register
- might be different, it seems to be better, to imagine pc and status
- register as two registers. The hardware-errors and the .REGISTERS
- instruction know about this.
-
- r0, r1, r2, r3, r4, r5, and r6 are available for use within code
- definitions. Don't try to use them for permanent storage, because
- they are used by many code words with no attempt to preserve the
- previous contents.
-
- Registers r7-r11 and r14 can be used within code definitions, but you
- have to save and restore their values at the beginning/end of the
- definition. r12, r13 and r15 should not be used.
-
-
- Inner <address> Interpreter
- ===========================
-
- The inner interpreter NEXT is direct threaded, post incrementing. The
- compilation address of all definitions contain machine code to be
- executed, not a pointer. Each CODE definition ends with the NEXT
- code, assembled in-line. The NEXT code is:
- pc ip )+ ldr
- This means: Load the program-counter PC ( don't affect the CPU status
- ) from the 4-byte cell pointed to by the instruction pointer ip,
- postincrement the instruction-pointer. So the NEXT is only one CPU
- instruction and very fast. It is much faster than
- address dolink branch
- ...
- pc link mov
- constructions because of only one pipeline reload per NEXT. But on the
- other hand, there is definitely a larger overhead for calling
- secondaries.
-
- For discussions about subroutine threaded ( macro extended ) versus
- threaded code implementations see the Forth literature. Generally,
- macros do bring some advantage in execution speed but give less
- information about the code itself, so debuggers are less useful. The
- penalty for direct threaded code is hard to predict, it depends very
- much on the type of application. Something like 50% sounds
- reasonable, so optimising the bottlenecks could bring big advantages.
- The 'runtimer ' utilities might help you doing this.
-
- The assembler macro C; assembles the NEXT instruction and ends
- assembling by END-CODE. A fast conditional next can be done by
- ...
- r2 0 cmp
- eq next
- ...
-
-
- Other Definitions
- =================
-
- Any word that is not a code definition contains a branch+link
- instruction at the code-field, this makes a relative branch to an
- inline-address and saves the pc+sr to the lk register.
- runtime-addr dolink branch
- The inline address points to a code fragment (headerless in most
- cases) that implements the run-time action of the word. The parameter
- field starts just after this branch+link instruction and can be found
- by clearing the flags in the link register like this:
- r0 lk th fc000003 # bic
- r0 get-link
-
- The run-time codes may have to push the top-register to the stack,
- save the return pointer to the return-stack and set the instruction or
- stack pointer to the parameter field address. All standard runtime
- codes (those of variables, constants, colon definitions, user
- variables ...) have been optimized for best cache-hit rates on ARM3/6
- machines.
-
- Note: WORD-TYPE ( cfa -- addr ) finds the address of the words runtime
- code in this implementation.
-
-
- Colon definitions
- =================
-
- The runtime code:
- mlabel docolon assembler
- ip rp push
- ip get-link c;
- The body of a Colon Definition starts 4 bytes after the compilation
- address. The body contains a list of compilation addresses of other
- words. Each such compilation address is a 32-bit number which is an
- absolute address.
-
-
- Variable
- ========
-
- The Parameter Field of a VARIABLE contains a 32-bit number which is
- the value of the variable. The runtime code:
- mlabel dovariable assembler
- top sp push
- top get-link c;
-
-
- Constants
- =========
-
- The Parameter Field of a CONSTANT contains the 32-bit value of the
- constant. The runtime code:
- mlabel doconstant assembler
- top sp push
- r0 get-link
- top r0 ) ldr c;
-
-
- User Variables
- ==============
-
- The value of a USER variable is stored in the USER area as a 32-bit
- number. The Parameter Field of a user variable contains a 32-bit
- offset into the user area of the current task. r8 contains the base
- address of the current user area. r8 is symbolically defined as UP in
- the assembler. The runtime code:
- mlabel douser assembler
- top sp push
- r0 get-link
- r0 r0 ) ldr
- top r0 up add c;
-
-
- Deferred words
- ==============
-
- The compilation address of the word to be executed by a DEFER word is
- stored as a 32-bit absolute address in the USER area. The Parameter
- Field of a deferred word contains a 32-bit number which is an offset
- into the user area of the current task. The runtime code:
- mlabel dodefer assembler
- r0 get-link
- r0 r0 ) ldr
- pc r0 up ib ldr end-code
- The last line holds a somewhat optimized NEXT instruction, it means:
- Load the pc from the address in the user area with the offset r0.
-
-
- ;code
- =====
-
- The compilation address of a word created by a CREATE ... ;CODE data
- type construction contains the standard branch+link instruction that
- branches to the runtime code.
-
- The runtime code is defined by the programmer in the ;CODE part of the
- definition. At first the ;CODE instruction assembles
- top sp push
- top get-link
- before the programmers code. This is mainly for convenience, so the
- top is already saved to the stack and points to the parameter field.
- The programmer might do '-2 cells allot ' to forget this for speed
- optimized code.
-
-
- does>
- =====
-
-
- mlabel dodoes assembler
- ip rp push
- ip get-link c;
- The runtime code is defined by the programmer in the DOES> part of the
- definition. Before branching to the dodoes code, the does>
- instruction assembles
- top sp push
- top lk th fc000003 # bic
- to get the parameter field address.
-
-
- local variables
- ===============
-
- RISC OS Forthmacs has built in ANS Forth conforming local variables
- spending their lifetime on the return-stack in stack-frames. The
- stack-frames are linked via a USER variable LOCAL-FRAME which is also
- used to locate a local variables value. The frame structure is like:
- | cfa:frame> | old-frame | old-rs | loc | loc | .........
- with cfa:pop-frame on top of the return-stack. pop-frame removes the
- current frame and switches to the last frame.
- headerless code pop-frame \ this routine is pushed on return stack by push-locals
- here-t /token-t + token,-t
- r0 rp 2 rp ia ldm
- r0 'user local-frame str
- ip rp pop c;
-
- The local variables are accessed using (loc) followed by an stack
- frame index.
- code (loc) \ ( -- n ) runtime-code of any local
- r0 'user local-frame ldr
- r1 ip )+ ldr
- top sp push
- top r0 r1 2 #asl db ldr c;
-
- Note: The disassembler can NOT know the local variables names, so it
- assumes names like V0 V1 ... .
-
-
- Vocabularies
- ============
-
- Each VOCABULARY has #THREADS ( currently 16 ) 32-bit pointers which
- are called "threads". A thread is the head of a linked list of words.
- A hashing function selects which of the 16 linked lists a particular
- word belongs to. The threads are stored in the USER area. The
- Parameter Field of a VOCABULARY contains the 32-bit offset of the
- threads in the USER area, followed by the vocabulary-link, a 32-bit
- pointer to the previous VOCABULARY. The runtime high-level code is:
- does> body> context token!
-
-
- Tokens
- ======
-
- Within the body of a colon definition, calls to other Forth words are
- compiled as the 32-bit absolute compilation address of those words.
- These tokens have a corresponding bit in the relocation table.
-
-
- Branching
- =========
-
- Branch targets are offsets relative to the location that contains the
- branch offset. They are stored as 32-bit twos-complement numbers
- representing the number of bytes between the offset location and the
- branch target. For example, a branch to the following location could
- be compiled with:
-
- postpone branch 4 ,
-
-
- Doubles
- =======
-
- RISC OS Forthmacs versions newer than 1.83 have full double number
- support, all conversion tools CONVERT, NUMBER?, D. use doubles, the
- 'scaling' words */ */MOD UM/MOD use double intermediate results.
-
- Also the text-interpreter and compiler accept literals as doubles when
- there is a period in it.
- : test 1234. d. ;
- 1234. is a double number and D. displays it.
-
- This could only be achieved with changing stack effects in a number of
- words. So these new RISC OS Forthmacs versions are no longer
- compatible when these words are used. The lib.compatible tool does
- NOT cover these changes.
-
- The advantage of the new stack behaviour is it's ANS compliancy and
- the improved arithmetic capabilities.
-
-
- Header format - # of bytes in parentheses
- =========================================
-
- Source Field (4), Link Field (4), Name Field (n), Padding (0 to 3),
- Flags (1), Code Field (4), Parameter Field (n).
-
- As all addresses need to be, the Link Field, Name Field, and Code
- Field are all aligned.
-
- Links point to links ( NOT to Name Fields, as in FIG Forth! )
-
- The name field is a normal Forth packed string. (Many Forth
- implementations set the high bit in the first and last characters of
- the name field; RISC OS Forthmacs does not).
-
- Name Field: length-byte, 0-31 character name.
-
-
-
- Vocabulary Format
- =================
-
- Vocabularies have #THREADS- way hashing. This means that each
- vocabulary has 16 separate linked lists of words. Before searching a
- vocabulary, a hashing function is applied to the name to be located.
- The hashing function selects one of the 16 linked lists to search.
-
- The hashing function is very simple. The lower 4 bits of the first
- character in the name (the first name character, not the length byte)
- are interpreted as a number from 0 to 15, selecting a linked list.
-
- Vocabularies are not chained to one another. Search order is
- implemented using the ALSO / ONLY scheme. Each vocabulary thread is
- terminated with a special link field in the final word. The special
- link address is the address of the origin of the Forth system (which
- may change from session to session due to the relocation that the
- operating system applies when loading and executing the Forth system.
-
- The parameter field for a vocabulary looks like:
-
- User number (/cell), Voc-link (/cell)
-
- The user number selects the place in the user area where the head of
- list pointers for the 16 vocabulary threads are stored. Each
- vocabulary requires 16 CELLS bytes of user area storage for these 16
- threads. The values stored in the user area are the Link field
- Addresses for the top word in each thread.
-
-
- Relocation
- ==========
-
- In the RISC OS environment all programs of the absolute type are
- loaded at $8000 and executed from there. So on first sight the
- relocation table doesn't make much sense in this version.
-
- But the relocation table can be used for target/meta-compiling or for
- relocating code during run-time. This is necessary for producing
- turnkey applications with an 'Application Stripper', use of the
- application stripper requires strict adherence to the rules of
- relocatability.
-
- If the program is not relocatable, then a file saved with SAVE-FORTH
- will work only if it is later executed at the same address where it
- was executing when it was saved. If saved with SAVE-TURNKEY, a
- program that is not relocatable will not work at all, regardless of
- the address where it is later executed. Consequently, use of the
- application stripper requires strict adherence to the rules of
- relocatability.
-
- In most cases, the relocation bitmap is maintained automatically,
- without requiring any special effort on the part of the programmer.
- However, there are some cases where the programmer must take explicit
- actions to ensure that the program is relocatable.
-
- The executable file contains a relocation list used to identify the
- locations in the program's binary image which contain absolute
- addresses. When the program is loaded, each of these locations is
- modified by adding the starting address of the program to the number
- contained in that location. Only 32-bit numbers may be so modified.
-
- While RISC OS Forthmacs is running, it maintains its own relocation
- table, identifying those locations in the Forth dictionary which must
- be relocated during COLD-CODE. Each bit in the map represents the
- address of one aligned location. This relocation table is completely
- different from the standard RISC OS relocation tables, it is only used
- from within RISC OS Forthmacs.
-
- In order for this to work properly, the programmer must be careful to
- use TOKEN, A, LINK, TOKEN!, A! or LINK! to store an address or token
- in the dictionary, all six set the relocation flags.
-
- Addresses may be stored into VARIABLEs with ! (without requiring the
- use of TOKEN! ) if the VARIABLE is re-initialized every time that the
- application is started. TOKEN! is only necessary if the VARIABLEs
- value must be set before SAVE-FORTH is executed, and then is used when
- the saved application is later invoked, without being re-initialized
- by the application's initialization code.
-
- If , or ! is used instead, the address will not be properly relocated
- if SAVE-FORTH has been used to write the dictionary image to an
- executable file.
-
- Note: The lib/checkrel.fth program can help you catch relocation
- problems in your applications. It should be loaded before you load
- your application, and will warn you if your application does things
- that may not be relocatable. After you have fixed the relocation
- problems, you can load your application without lib/checkrel.fth .
- Also .BUFFERS
-
-
- See: .BUFFERS .POINTERS TOKEN! TOKEN, A! A, LINK! LINK,
- SET-RELOCATION-BIT RELOCATION-MAP
-
-
- Program header
- ==============
-
- The header of the executable binary image looks like this:
-
- h_magic ( 0) \ Magic Number
- h_tlen ( 4) \ length of text (code)
- h_dlen ( 8) \ length of initialised data
- h_blen ( c) \ length of BSS unitialised data
- h_slen ( 10) \ length of symbol table
- h_entry ( 14) \ Entry address
- h_trlen ( 18) \ Text Relocation Table length
- h_drlen ( 1c) \ Data Relocation Table length
-
- the magic number is the branch+link instruction just behind this
- header. Note: this header might be changed with future releases
- according to Acorns executable binary code standard.
-
-
- Heap memory
- ===========
-
- RISC OS Forthmacs is loaded to $8000 and will have as much memory
- available as was defined by 'WimpSlot' .
-
- The MAIN-TASKs user area immediately follows the first instructions at
- $8050.
-
- $600 byte will be allocated in module-heap, it will hold the ENV-AREA,
- the command-line area, the INTERRUPT-CODE giving GET-TICKS plus all
- handlers used by shelled programs.
-
- The implementation of the dynamic memory manager has changed in
- Version 3.1-2.00. From now on the dictionary and the heap share the
- same memory area, the dictionary grows from lower addresses and the
- heap can be as large as the area between the stacks and HERE.
-
- Note: Of course you may install another memory manager or add more
- heaps.
-
-
- Dictionary memory
- =================
-
- At the top of the dictionary are both stacks defined by RP0 - RS-SIZE
- and SP0 - PS-SIZE and the TIB, below this are MBytes of free memory
- (well, hopefully). HERE marks the end of the allocated dictionary,
- classically PAD is HERE plus something.
-
- RISC OS Forthmacs knows about two dictionary areas, the RESIDENT
- (which is the dictionary you know in all implementations) and the
- TRANSIENT. The transient dictionary is in the heap memory, definitions
- defined here won't use dictionary space in the target application. So
- it might be useful to do:
- transient
- fload assembler
- fload debugger
- resident
- fload myapplication
- Now the debugger and assembler will be in transient address space. To
- remove all links, pointers etc. into the transient address space use
- DISPOSE, it will do this for you. .DISPOSE will also give some
- informations what is removed.
-
-